ABSTRACT

FungiDB (http://FungiDB.org) is a functional
genomic resource for pan-fungal genomes that
was developed in partnership with the Eukaryotic
Pathogen Bioinformatic Resource Center (http://EuPathDB.org).
FungiDB uses the same infrastructure and user interface
as EuPathDB, which allows sophisticated and integrated
searches to be performed using an intuitive graphical
system. The current release of FungiDB contains genome
sequence and annotation from 18 genomes (17 species)
spanning several fungal classes, including the Ascomycota
classes Eurotiomycetes, Sordariomycetes and
Saccharomycetes, the Basidiomycota classes
Pucciniomycetes and Tremellomycetes, and the
basal 'Zygomycete' lineage Mucormycotina.
Additionally, FungiDB contains cell cycle microarray
data, hyphal growth RNA-sequence data and yeast
two-hybrid interaction data. The underlying genomic
sequence and annotation, combined with functional
data, additional data from the FungiDB standard
analysis pipeline and the ability to leverage
orthology, provide a powerful resource for in silico
experimentation.

INTRODUCTION 

The recent dramatic increase in the number and scale of
genome sequence and functional genomic data sets (e.g.
proteomic, microarray, RNA-sequence and ChIP-chip data)
has made it increasingly challenging for scientists to
navigate through the milieu of data. Importantly, it has
become essential to be able to interrogate data sets from
multiple genomes in an integrated fashion. To this end,
FungiDB (http://fungidb.org) was developed as a resource
for genomic and functional genomic data across the
fungal kingdom.

FungiDB was developed in partnership with the
NIAID-funded Eukaryotic Pathogen Bioinformatic
Resource Center (http://eupathdb.org) (1). As such, this
resource uses the same database framework and employs
the graphical search strategy interface of the Web
Development Kit (WDK) (2). Genomes currently in
FungiDB are obtained primarily from the Broad
Institute (http://broadinstitute.org) and the dedicated
Aspergillus, Candida and Saccharomyces genome
resources (3-5).

FungiDB provides a data-mining interface to the
comparative and functional genomic data of multiple
fungal species. It differs from the species-focused
resources SGD, CGD and AspGD, and provides an
integrated query system as part of the WDK and the
Genomics Unified Schema (GUS) database structure.
FungiDB also differs from other resources such as Ensembl
Fungi (6), the Joint Genome Institute's MycoCosm
(http://jgi.doe.gov/fungi), IMG tools (7) and MicrobesOnline
(8), which provide complementary data query or
visualization tools but lack the data-mining capabilities
and broad cross-species comparisons that are possible
with the WDK search interface.

Data are currently obtained directly from providers at
sequencing centers, GenBank and associated functional
data repositories (GEO or SRA), or the key model
organism databases SGD, CGD and AspGD. The initial
1.0-P release focused on a cluster of Aspergillus
species and key species from the Basidiomycota,
primarily provided by the Broad Institute or the Joint
Genome Institute.



GENOMES AND DATA IN THE CURRENT RELEASE
OF FUNGIDB

The current release of FungiDB contains genome
sequence and annotation from 17 annotated fungal
genomes (Aspergillus clavatus, A. flavus, A. fumigatus,
A. nidulans, A. niger, A. terreus, Candida albicans,
Coccidioides immitis (strain RS), Cryptococcus
neoformans, Fusarium graminearum, F. oxysporum,
Fusarium verticillioides (Gibberella moniliformis),
Magnaporthe oryzae, Neurospora crassa, Puccinia graminis,
Rhizopus oryzae and Saccharomyces cerevisiae) and one
unannotated strain (H538.4) of C. immitis, providing
650 Mb of sequence and 207 111 genes. Specific details
about these genomes are available in Table 1.

FungiDB also contains several functional genomic data
sets: (i) EST data obtained from dbEST
(http://www.ncbi.nlm.nih.gov/dbEST/) (23) for A. flavus,
A. terreus, A. niger and G. moniliformis; (ii) cell cycle
microarray data for S. cerevisiae based on different
synchronization methods (24, 25); (iii) RNA-sequence data
from R. oryzae during hyphal growth (Stajich, J.E., Sain, D.,
Abramyan, J., unpublished data); and (iv) yeast two-hybrid
data from S. cerevisiae (5).

All genomes and annotation in FungiDB are processed
through a common analysis pipeline, which provides
additional data, including InterPro domains (26), Gene
Ontology term associations (27), signal peptide predictions
(28), transmembrane domain predictions, open reading
frame predictions, BLAST searches against the non-redundant
database at the National Center for Biotechnology
Information (NCBI), orthology prediction based on OrthoMCL
(29) and synteny prediction. Pipeline details are available
at http://s.fungidb.org/pipeline_methods.



HOW TO USE FUNGIDB 

The home page 

The FungiDB web interface is designed to provide the
user with convenient and straightforward access to the
underlying data (Figure 1). The home page is divided
into three main sections: the banner (Figure 1A), the
information and help menus (Figure 1B) and the searches
and tools (Figure 1C). The banner section appears on all
FungiDB pages, giving users quick access from any page
on the site to gene ID and text searches, the 'contact us'
form, registration/login and general information. Creating
an account and logging in allows users to save and share
search strategies, save gene lists and create gene-associated
annotation comments that are attributed to the author.
The gray tool bar section of the banner consists of a
series of mouse-over menus that link to all tools and
searches in this resource. The expandable information and
help menus on the left-hand side of the home page
(Figure 1B) provide access to a data summary table of all
data in FungiDB, a news section, community resources
(including useful links and upcoming events), web
tutorials and additional help and information. The central
section contains links to all searches and tools in FungiDB;
clicking on the plus symbols (arrow in Figure 1C) expands
the various categories to reveal the underlying searches.
These include searches that return sets of genes
(left column of Figure 1C), searches that return other
data types such as expressed sequence tags (ESTs), open
reading frames (ORFs), genomic segments (these include
DNA motifs) and genomic sequences (i.e. scaffolds and
chromosomes) (central column of Figure 1C), and tools
in the right-hand column of Figure 1C, including access to
the GMOD genome browser (30), BLAST (31), the
sequence retrieval tool, web services searches and a list
of fungal-related literature (based on PubMed searches).

Table 1. Species in the current release of FungiDB

Phylum | (Sub)class | Species | Strain | Genome size (Mb) | No. of genes | Ref
Ascomycota | Eurotiomycetes | Aspergillus clavatus | NRRL 1 | 27.86 | 9413 | (9)
Ascomycota | Eurotiomycetes | Aspergillus flavus | NRRL 3357 | 36.79 | 12730 | NA
Ascomycota | Eurotiomycetes | Aspergillus fumigatus | Af293 | 29.39 | 10067 | (10)
Ascomycota | Eurotiomycetes | Aspergillus nidulans | FGSC A4 | 30.5 | 10865 | (11)
Ascomycota | Eurotiomycetes | Aspergillus niger | ATCC 1015 | 37.20 | 6679 | (12)
Ascomycota | Eurotiomycetes | Aspergillus terreus | NIH 2624 | 29.33 | 10564 | NA
Ascomycota | Eurotiomycetes | Coccidioides immitis | H538.4 | 27.73 | 10640 | (13)
Ascomycota | Eurotiomycetes | Coccidioides immitis | RS | 28.95 | 9878 | (14)
Ascomycota | Sordariomycetes | Fusarium graminearum | PH-1 (NRRL 31084) | 36.45 | 13605 | (15)
Ascomycota | Sordariomycetes | Fusarium oxysporum | f. sp. 4287 | 61.4 | 17466 | (16)
Ascomycota | Sordariomycetes | Fusarium verticillioides (Gibberella moniliformis) | 7600 | 41.78 | 14457 | (16)
Ascomycota | Sordariomycetes | Magnaporthe oryzae (Magnaporthe grisea) | 70-15 | 41.70 | 11385 | (17)
Ascomycota | Sordariomycetes | Neurospora crassa | OR74A | 41.04 | 10154 | (18)
Ascomycota | Saccharomycetes | Candida albicans | SC5314 | 14.32 | 6444 | (19)
Ascomycota | Saccharomycetes | Saccharomyces cerevisiae | S288c | 12.16 | 6926 | (20)
Basidiomycota | Pucciniomycetes | Puccinia graminis f. sp. tritici | CRL 75-36-700-3 | 88.65 | 21065 | (21)
Basidiomycota | Tremellomycetes | Cryptococcus neoformans var. grubii | H99 | 18.88 | 7121 | NA
Mucormycotina (Zygomycota) | - | Rhizopus oryzae (Rhizopus delemar) | RA 99-880 | 46.09 | 17652 | (22)

All genomes were downloaded from their sources in January 2011. Additional information is available at http://s.fungidb.org/GenomeDataTypes.

Figure 1. Screen shot of the FungiDB home page. (A) The banner section, present on all FungiDB web pages, provides links to registration, login and
'contact us' forms, ID and text searches, information and help, and all available searches and tools. (B) The side bar provides expandable tabs with
information such as a data summary table, community events, news, tutorials and help links. (C) The central portion of the home page contains three
sections: the left column contains searches that return genes, the middle column contains searches that return other entities such as genomic sequence,
ESTs and DNA motifs, and the right column contains tools such as BLAST, sequence retrieval and web services. The plus symbols can be
expanded to reveal specific searches (upward-pointing red arrow). The green inset is an example search page following selection of a 'genes by species'
search.
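As a quick arithmetic check, the release totals quoted earlier (18 genomes from 17 species, about 650 Mb and 207 111 genes) can be recomputed from Table 1. The minimal Python sketch below simply transcribes the table rows; the names TABLE_1 and summarize are illustrative and are not part of FungiDB.

```python
# Rows transcribed from Table 1: (species, strain, genome size in Mb, number of genes).
TABLE_1 = [
    ("Aspergillus clavatus", "NRRL 1", 27.86, 9413),
    ("Aspergillus flavus", "NRRL 3357", 36.79, 12730),
    ("Aspergillus fumigatus", "Af293", 29.39, 10067),
    ("Aspergillus nidulans", "FGSC A4", 30.5, 10865),
    ("Aspergillus niger", "ATCC 1015", 37.20, 6679),
    ("Aspergillus terreus", "NIH 2624", 29.33, 10564),
    ("Coccidioides immitis", "H538.4", 27.73, 10640),
    ("Coccidioides immitis", "RS", 28.95, 9878),
    ("Fusarium graminearum", "PH-1 (NRRL 31084)", 36.45, 13605),
    ("Fusarium oxysporum", "f. sp. 4287", 61.4, 17466),
    ("Fusarium verticillioides", "7600", 41.78, 14457),
    ("Magnaporthe oryzae", "70-15", 41.70, 11385),
    ("Neurospora crassa", "OR74A", 41.04, 10154),
    ("Candida albicans", "SC5314", 14.32, 6444),
    ("Saccharomyces cerevisiae", "S288c", 12.16, 6926),
    ("Puccinia graminis f. sp. tritici", "CRL 75-36-700-3", 88.65, 21065),
    ("Cryptococcus neoformans var. grubii", "H99", 18.88, 7121),
    ("Rhizopus oryzae", "RA 99-880", 46.09, 17652),
]

def summarize(rows):
    """Return (genomes, distinct species, total Mb, total genes) for the rows."""
    genomes = len(rows)
    species = len({species_name for species_name, _, _, _ in rows})
    total_mb = sum(size for _, _, size, _ in rows)
    total_genes = sum(genes for _, _, _, genes in rows)
    return genomes, species, total_mb, total_genes

genomes, species, total_mb, total_genes = summarize(TABLE_1)
print(genomes, species, round(total_mb), total_genes)  # 18 17 650 207111
```

The two C. immitis strains are why the genome count (18) exceeds the species count (17).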

Performing a search in FungiDB 

A search in FungiDB is initiated by selecting a search
type, which opens a new page with additional parameters
that can be set. For example, selecting a 'genes by species'
search opens a new page where a user can select which
organisms to include in the search (Figure 1C inset, green
box). Once a search is executed, results are displayed as
the first step in a search strategy (see 'Building a search
strategy in FungiDB' below). In the example shown in
Figure 2, a search for all genes in all species in FungiDB
returns 207 111 genes (the number of genes is displayed
in the first step of the strategy; yellow box, Figure 2A).
The distribution of the gene results from Step 1 among
the various species in FungiDB is displayed in a filter
table below the strategy (Figure 2B). A user can toggle
between the species-specific results by selecting the
number in the desired cell. The results list for the current
selection (yellow-highlighted step and cell) is displayed
in a dynamic table below the filter table (Figure 2C).
Users may navigate through the results by selecting
specific page numbers, clicking on next/previous (or
first/last) and choosing the number of results to display
per page by selecting 'advanced paging'. In addition,
columns may be added to the results table by clicking on
'select columns' and choosing which columns to display
(green inset, Figure 2C). Columns in the results table
may be removed (click on the 'x' to the right of a column
title), moved (drag and drop) or sorted (click on the
up/down arrows to the left of a column title). Individual
items in the results table may be added to the basket (see
below) by clicking on the basket icon (arrow, Figure 2C).
Also, individual records (for example, gene pages) may be
accessed by clicking on the item ID in the first column of
each result list (Figure 2D).
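Conceptually, the filter table is a per-species tally of one step's results, and clicking a cell restricts the results table to that species before paging. The minimal Python sketch below illustrates only that idea; the gene IDs are hypothetical placeholders, and the function names and default page size of 20 are assumptions, not FungiDB internals.

```python
from collections import Counter

# Hypothetical (gene ID, species) rows standing in for one step's results.
results = [
    ("GENE_A1", "Aspergillus nidulans"),
    ("GENE_A2", "Aspergillus nidulans"),
    ("GENE_N1", "Neurospora crassa"),
    ("GENE_S1", "Saccharomyces cerevisiae"),
    ("GENE_A3", "Aspergillus nidulans"),
]

def filter_table(rows):
    """Number of results per species (one cell per species in the filter table)."""
    return Counter(species for _gene_id, species in rows)

def select_species(rows, species):
    """Gene IDs shown in the results table after clicking one species' cell."""
    return [gene_id for gene_id, sp in rows if sp == species]

def page(items, page_number, page_size=20):
    """Return a 1-based page of items; the last page may be shorter."""
    if page_number < 1:
        raise ValueError("page_number starts at 1")
    start = (page_number - 1) * page_size
    return items[start:start + page_size]

counts = filter_table(results)
print(counts["Aspergillus nidulans"])  # 3
nidulans = select_species(results, "Aspergillus nidulans")
print(page(nidulans, page_number=2, page_size=2))  # ['GENE_A3']
```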



BUILDING A SEARCH STRATEGY IN FUNGIDB 

The search strategy system implemented in FungiDB (1) is
designed to encourage users to run in silico experiments.
After running the initial search, the search strategy may be
expanded by clicking on the 'Add Step' button (red button,
Figure 2A). This reveals a popup (Figure 2A, green box)
listing all searches in FungiDB, and the results of any
chosen search may be combined with the results of the
previous step using a set operation (intersect, union or
minus). Steps in a strategy may be viewed, revised,
renamed, developed further by nesting, or deleted.
Furthermore, entire search strategies may be renamed,
copied, saved, shared with a unique strategy URL, or
deleted.
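A strategy can be viewed as a left-to-right fold of set operations over ID sets. The minimal Python sketch below models that idea. It assumes that 'minus' keeps the previous step's results that are absent from the new search, and it uses hypothetical gene IDs and search names. It is not how FungiDB or the WDK evaluate strategies internally.

```python
def combine(previous, new, operation):
    """Combine the IDs from the previous step with a new search's IDs."""
    if operation == "intersect":
        return previous & new
    if operation == "union":
        return previous | new
    if operation == "minus":
        # Assumed direction: keep previous results that are NOT in the new search.
        return previous - new
    raise ValueError(f"unknown operation: {operation!r}")

def run_strategy(first_step, added_steps):
    """Fold (operation, search_result) steps, in order, onto the first step."""
    result = set(first_step)
    for operation, search_result in added_steps:
        result = combine(result, set(search_result), operation)
    return result

# Hypothetical gene IDs, for illustration only.
secreted = {"g1", "g2", "g3", "g4"}
expressed_in_hyphae = {"g2", "g3", "g5"}
has_human_ortholog = {"g3"}

hits = run_strategy(secreted, [
    ("intersect", expressed_in_hyphae),  # {"g2", "g3"}
    ("minus", has_human_ortholog),       # {"g2"}
])
print(sorted(hits))  # ['g2']
```

A nested step fits the same model: the result of an inner run_strategy call is passed as the search_result of an outer step.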
Figure 2. (A) Screen shot of a search strategy whose first step is the result of running the search shown in Figure 1C (inset). Engaging the
'Add Step' button reveals a popup with all available searches in FungiDB. Set operations (intersect, union and minus) are used to combine searches
with each other in a search strategy (green box). (B) The filter table appears below the search strategy and shows results across all species in
FungiDB. Selecting any of the cells will filter the results and show them in the results table shown in C. (C) The results of any search are displayed in
a dynamic table that allows removing (click on the 'x' next to the column name), adding (green box) and moving (drag and drop) columns,
downloading results and adding results to the basket (red upward-pointing arrow). (D) Selecting any of the record IDs in the first column opens
a record page (the gene page is shown in D). The gene page includes sections such as the gene ID and gene product name, genomic context
showing synteny with related species in FungiDB, annotation with links to the 'User Comment' form (shown in E), protein information (e.g. InterPro
domains, hydropathy plots and signal peptide predictions), expression data (where available) and the sequence. (E) The user comment form allows
comments by the community, including free text, PubMed IDs, GenBank accession numbers, attached images and files, and links to
multiple records. (F) A genome browser view in FungiDB, accessible from the genomic context view in the gene page or from the tools section. Tracks
such as synteny (shown in F) and RNA sequence coverage plots (shown in G) can be loaded in the genome browser.



The multi-step search strategy in Figure 3 identifies
potential drug targets in C. neoformans by using
cross-species comparisons to find enzymes that may be
associated with growth. Step 1 of the strategy identifies
all genes with an Enzyme Commission (EC) number (in
FungiDB these data are available for S. cerevisiae) to start
with a set of known enzymes. Step 2 combines the results
of Step 1, using a union operation, with all genes in
FungiDB that have an associated 'metabolic process' Gene
Ontology (GO) term (automatically assigned to all genomes
in FungiDB as part of the standard analysis pipeline), to
identify additional potential enzymes with this GO
annotation. Step 3 transforms the results from Step 2 into
their orthologs in all organisms in FungiDB. In Step 4, the
results are intersected



with genes that are upregulated during hyphal growth in
R. oryzae [note that this step is actually a nested strategy
(Figure 3B) in which the R. oryzae results are transformed
into their FungiDB orthologs] to find genes that are likely
involved in active growth of the fungus and its cell
wall. Because an RNA-Seq data set from actively growing
cells was not available for C. neoformans, the R. oryzae set
is used to filter the genes. Step 5 intersects Step 4
with all genes in FungiDB that do not have orthologs in
bacteria, archaea and non-fungal eukaryotes (based on
OrthoMCL results) in order to find genes that are
unique to fungi. The final list of results is then filtered
on C. neoformans using the filter table as described above
to reveal 32 genes. Although this is clearly a hypothetical
list of potential drug targets, it is intriguing that several
Figure 3. Screen shot of a multi-step search strategy in FungiDB. (A) The main body of the search strategy. Step 1 asks for all genes in FungiDB
annotated with an EC number. Step 2 uses a union operation to combine results from Step 1 with all genes with a 'metabolic process' Gene Ontology
association. Step 3 transforms the results from Step 2 into their orthologs in all species in FungiDB. Step 4 intersects results from Step 3 with
a substrategy (B). Step 5 intersects the results of Step 4 with results from an orthology phylogenetic pattern search that asks for all genes in FungiDB
that do not have orthologs in mammals. A filter is applied (see Figure 2B) to the results in Step 5 to reveal only C. neoformans genes. (B) An
expanded view of the substrategy that was combined with Step 3 in A, which defines all orthologs of R. oryzae genes that are upregulated in
hyphae based on RNA sequence data. Saved strategies may be shared with others using a unique strategy URL, shown in the inset
(http://fungidb.org/fungidb/im.do?s=c485b8fe6e1769bd).



known drug targets appear in the list, including the
1,3-β-glucan synthase gene (S. cerevisiae FKS1 homolog),
which is the target of echinocandins, and an
oxysterol-binding protein involved in ergosterol synthesis
(ergosterol is the target of amphotericin B). This strategy
may be viewed and shared using the following URL:
http://fungidb.org/fungidb/im.do?s=c485b8fe6e1769bd.
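The Figure 3 strategy can be mirrored with plain set operations plus an orthology transform. In the minimal Python sketch below, every gene ID, orthology group and input set is hypothetical, and 'transform to orthologs' is modelled as OrthoMCL-style group membership: a gene maps to every member of its group, in all species. This illustrates the logic of the steps only. It is not the FungiDB implementation and does not reproduce the 32-gene result.

```python
# Hypothetical orthology groups (OrthoMCL-style): gene ID -> group ID.
ORTHO_GROUP = {
    "Sc_1": "OG1", "Cn_1": "OG1", "Ro_1": "OG1",
    "Sc_2": "OG2", "Cn_2": "OG2",
    "Cn_3": "OG3", "Ro_3": "OG3",
    "Sc_4": "OG4", "Cn_4": "OG4", "Ro_4": "OG4",
}
PREFIX_TO_SPECIES = {"Sc": "S. cerevisiae", "Cn": "C. neoformans", "Ro": "R. oryzae"}
SPECIES_OF = {g: PREFIX_TO_SPECIES[g.split("_")[0]] for g in ORTHO_GROUP}

def transform_to_orthologs(genes):
    """Map genes to every member of their orthology groups, in all species."""
    groups = {ORTHO_GROUP[g] for g in genes if g in ORTHO_GROUP}
    return {g for g, grp in ORTHO_GROUP.items() if grp in groups}

# Hypothetical inputs for each search in the strategy.
ec_annotated = {"Sc_1", "Sc_2"}            # genes with an EC number
metabolic_go = {"Cn_3", "Sc_4"}            # genes with a 'metabolic process' GO term
ro_upregulated = {"Ro_1", "Ro_3", "Ro_4"}  # R. oryzae genes up in hyphae
fungus_only_groups = {"OG1", "OG3"}        # groups with no non-fungal members
fungus_only = {g for g, grp in ORTHO_GROUP.items() if grp in fungus_only_groups}

step1 = ec_annotated
step2 = step1 | metabolic_go                           # Step 2: union
step3 = transform_to_orthologs(step2)                  # Step 3: orthology transform
substrategy = transform_to_orthologs(ro_upregulated)   # nested strategy (Figure 3B)
step4 = step3 & substrategy                            # Step 4: intersect
step5 = step4 & fungus_only                            # Step 5: intersect
final = {g for g in step5 if SPECIES_OF[g] == "C. neoformans"}  # filter table
print(sorted(final))  # ['Cn_1', 'Cn_3']
```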

ADDITIONAL FEATURES IN FUNGIDB 

Favorites 

The favorites tool allows users of FungiDB to bookmark
their favorite genes for quick future access. A gene can be
added to or removed from the favorites by clicking on the
favorites icon (star) on its gene page (Figure 2D). The
favorites page is accessed via the favorites link in the gray
tool bar (Figure 1A). Genes on the favorites page can be
assigned to user-defined projects, and free text can be
added to each gene.

Basket 

The basket tool allows a user to cherry-pick individual
results (i.e. genes, ESTs and genomic sequences) and
place them in the basket for further analysis. Items can
be added to the basket by clicking on the basket icon in a
list of results (Figure 2C) or at the top of a record page
(Figure 2D). Once the desired set of items has been added
to the basket, a user may add the basket contents to a
search strategy and analyze them further by combining
them with other data in FungiDB.

Weighted searches 

Weighting allows a user to add arbitrary weights to steps
in a strategy. As the strategy grows, the results are sorted
by the sum of their weights. For this feature to function
properly, steps need to be combined using the union
operation. The benefit of this feature is that items that do
not meet all the criteria in a strategy are not lost but
instead appear in the final result list, albeit ranked lower.
As an example, the strategy described in Figure 3A was
weighted (each step was weighted on a scale from 1 to 10);
details are available at
http://fungidb.org/fungidb/im.do?s=3921e20d384bd503.

Although the final result list expands dramatically
(as expected, since the union operation was used to
combine steps), it is returned as a ranked list. Additional
high-interest genes that appear as top hits in the weighted
search include sulfite reductase.
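The ranking described above amounts to scoring each ID in a union by the summed weights of the steps that return it. The minimal Python sketch below assumes that each step contributes its weight once per ID and that ties are broken alphabetically. Both choices are assumptions made for illustration, and the IDs and weights are hypothetical.

```python
from collections import defaultdict

def weighted_union(steps):
    """Union all steps and score each ID by the summed weight of the steps it is in.

    steps: list of (weight, set_of_ids). Returns (id, score) pairs, highest
    score first; ties are broken alphabetically by ID.
    """
    scores = defaultdict(int)
    for weight, ids in steps:
        for item in ids:
            scores[item] += weight
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))

ranked = weighted_union([
    (10, {"g1", "g2"}),  # hypothetical step weighted 10
    (5, {"g2", "g3"}),   # hypothetical step weighted 5
    (1, {"g3", "g4"}),   # hypothetical step weighted 1
])
print(ranked)  # [('g2', 15), ('g1', 10), ('g3', 6), ('g4', 1)]
```

An intersection would keep only g2 here. The union keeps all four IDs and pushes partial matches down the list rather than discarding them.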

Genomic colocation 

This tool enables users to identify genomic features based
on their location relative to each other on the genome.
For example, a user can identify a DNA motif in their
favorite fungal genome and then find genes that have
one of these motifs within 1000 bp of their 5'-end. One
can then ask questions about this list of motif-containing
genes, such as when they are expressed or which GO terms
are associated with them.
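The motif example above can be sketched as a distance test between each motif and each gene's strand-aware 5' end. This minimal Python version makes several assumptions. Coordinates are 1-based and inclusive. "Within 1000 bp" is read as on either side of the 5' end. Every gene is compared with every motif, which is fine for a sketch but not for whole genomes (an interval index would be used there). Gene and motif records are hypothetical.

```python
def distance_to_point(start, end, point):
    """Distance (bp) from interval [start, end] to a position; 0 if it overlaps."""
    if start <= point <= end:
        return 0
    return start - point if point < start else point - end

def genes_near_motifs(genes, motifs, max_distance=1000):
    """Return IDs of genes with a motif within max_distance bp of their 5' end.

    genes: (gene_id, seqid, start, end, strand); motifs: (seqid, start, end).
    The 5' end is `start` on the + strand and `end` on the - strand.
    """
    hits = []
    for gene_id, seqid, start, end, strand in genes:
        five_prime = start if strand == "+" else end
        if any(m_seqid == seqid
               and distance_to_point(m_start, m_end, five_prime) <= max_distance
               for m_seqid, m_start, m_end in motifs):
            hits.append(gene_id)
    return hits

genes = [
    ("geneA", "chr1", 5000, 6000, "+"),
    ("geneB", "chr1", 9000, 9500, "-"),
    ("geneC", "chr2", 100, 900, "+"),
]
motifs = [("chr1", 4200, 4210), ("chr1", 10200, 10210)]
print(genes_near_motifs(genes, motifs))  # ['geneA', 'geneB']
```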

User comments 

Comments by users may be added to record pages (such as
gene pages) in FungiDB by clicking on the 'add comment'
link (Figure 2D and E). The comment form provides
a user with a quick and straightforward mechanism to
enhance FungiDB with their information, which may
include free text, references (PubMed IDs or digital
object identifiers), NCBI accession numbers, images
and documents, and genomic location coordinates.
Moreover, a comment may be linked to multiple records
in the database using their IDs. Once a user submits a
comment, it appears immediately for all users on the
record page and becomes searchable via the text search
functionality.



DATA DOWNLOAD AND SEQUENCE RETRIEVAL 

Data in FungiDB are conveniently available for bulk
download from the 'Data Files' section accessible from
the 'Downloads' menu item in the gray tool bar
(Figure 1A). Data files are in folders organized by
database release version number and species. Within
each species-specific directory, data may be downloaded
in FASTA, text or GFF formats.
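Downloaded GFF files can be read with a few lines of code. The minimal Python sketch below assumes that the files follow the GFF3 specification (nine tab-separated columns, with key=value attributes in column 9 that may be percent-encoded). That layout comes from the GFF3 specification, not from FungiDB documentation, so check it against an actual file. The sample lines and IDs are hypothetical.

```python
from urllib.parse import unquote

def parse_gff3(lines, feature_type="gene"):
    """Return a list of dicts for features of one type from GFF3 text lines."""
    records = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature lines follow
        if not line or line.startswith("#"):
            continue  # blank line, directive or comment
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        attrs = {}
        for field in cols[8].split(";"):
            if "=" in field:
                key, value = field.split("=", 1)
                attrs[key] = unquote(value)  # GFF3 percent-encodes reserved characters
        records.append({
            "id": attrs.get("ID"),
            "seqid": cols[0],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "strand": cols[6],
            "attributes": attrs,
        })
    return records

sample = [
    "##gff-version 3",
    "contig_1\tsource\tgene\t1200\t2400\t.\t+\t.\tID=GENE_001;Name=example%20gene",
    "contig_1\tsource\tmRNA\t1200\t2400\t.\t+\t.\tID=GENE_001-T;Parent=GENE_001",
]
parsed = parse_gff3(sample)
print(parsed[0]["id"], parsed[0]["start"], parsed[0]["attributes"]["Name"])  # GENE_001 1200 example gene
```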

The sequence retrieval tool, accessible from the tools
section (Figure 1C), allows users to specify exact
coordinates to be downloaded. Additionally, the results of
searches may be downloaded in bulk and in a defined
manner. For example, one may choose to download the
300 nt upstream of the translation start site of each gene
in a search strategy in order to retrieve promoter sequences
for the genes returned by a query.
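Retrieving upstream sequence depends on the strand. On the plus strand the region lies before the start codon. On the minus strand it lies after the coding end and must be reverse-complemented. The minimal Python sketch below assumes 1-based inclusive coding (CDS) coordinates rather than gene coordinates that include UTRs, and it truncates at contig ends. It illustrates the computation and is not the FungiDB retrieval tool. The example contigs are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Reverse complement of a DNA string (IUPAC codes other than N are left as-is)."""
    return seq.translate(COMPLEMENT)[::-1]

def upstream_of_start(contig, start, end, strand, length=300):
    """Return up to `length` nt immediately upstream of the translation start.

    start/end: 1-based inclusive coding coordinates with start <= end.
    The result reads 5'->3' on the gene's strand and is shorter than
    `length` if the contig ends first.
    """
    if strand == "+":
        # Start codon begins at `start`; upstream lies at lower coordinates.
        return contig[max(0, start - 1 - length):start - 1]
    if strand == "-":
        # Start codon ends at `end`; upstream lies at higher coordinates.
        return reverse_complement(contig[end:end + length])
    raise ValueError(f"unexpected strand: {strand!r}")

plus_contig = "TTTTAAAAATGCCC"  # ATG at positions 9-11
print(upstream_of_start(plus_contig, 9, 14, "+", length=4))   # AAAA
minus_contig = "GGGCATTTGC"     # CAT at 4-6 is ATG on the minus strand
print(upstream_of_start(minus_contig, 1, 6, "-", length=4))   # GCAA
```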



FUTURE DIRECTIONS 

FungiDB is expected to expand dramatically over the next
few years to include >100 genome sequences and
annotations. This will fill in more of the phylogenetic
diversity and cover additional plant and animal pathogenic
fungi, allowing comparisons of common genomic traits among
related and independent origins of pathogenicity. Importantly,
the functional data portion is expected to accumulate
multiple data sets from high-throughput proteomics (32),
transcriptomics and metabolomics (33) studies to enable
data mining and querying within a species' data that can
connect genes to function. Housing many sets of
functional genomics information will allow data to be
applied to organisms with less experimental data
through comparative analyses, thus helping to inform
potential gene function in less tractable study organisms.

The work underway to further develop FungiDB is
focused on enabling a 'franchise model' of the
EuPathDB system, in which the site runs entirely
independently of the present core EuPathDB system. This
will greatly simplify the effort needed to deploy instances of
EuPathDB for groups of species as independent instances
of the database and website, since the tools are general
purpose and the WDK search interfaces are useful to a
variety of groups of organisms. This will provide a more
generic installation of the software that runs EuPathDB
to support clade-specific databases and websites such as
FungiDB. Future work will also provide standalone
installation documentation describing best practices
to deploy, configure and load data into the system.